Abstract

In this paper, we address the problem of dense 3D reconstruction from multiple view images subject to strong lighting variations. In this regard, a new piecewise framework is proposed to explicitly take into account the change of illumination across several wide-baseline images. Unlike multi-view stereo and multi-view photometric stereo methods, this pipeline deals with wide-baseline images that are uncalibrated, in terms of both camera parameters and lighting conditions. Such a scenario is meant to avoid the use of any specific imaging setup and to provide a tool for normal users without any expertise. To the best of our knowledge, this paper presents the first work that deals with such an unconstrained setting. We propose a coarse-to-fine approach, in which a coarse mesh is first created using a set of geometric constraints and then fine details are recovered by exploiting photometric properties of the scene. The fine details are added to the coarse mesh via a final optimization step. Note that the method does not provide a generic solution for the multi-view photometric stereo problem, but it relaxes several common assumptions of this problem. Given its piecewise nature, the approach scales well in size, dealing with large-scale optimization and with severe missing data. Experiments on the Robot dataset benchmark evaluate the method's performance against 3D ground truth.


1 Introduction 

This paper addresses the problem of dense reconstruction from several uncalibrated multi-view images that are subject to lighting variation. Such a dense reconstruction pipeline is meant to work with an off-the-shelf hardware setup (both in terms of image acquisition and processing) in a reasonable amount of time. The proposed method is based on two fundamental reconstruction methods and accordingly consists of two main stages: first, Structure from Motion (SfM) is used to recover the coarse geometry of the scene or shape, and second, Photometric Stereo (PS) is used to recover fine surface details by exploiting photometric properties of the surface. In this way, the method takes maximum advantage of the geometric and photometric scene properties captured by multi-view and multi-illumination images. For this reason, we call our pipeline PiMPeR, which stands for Piecewise Multi-view Photo-geometric dense Reconstruction. A sparse 3D point cloud is obtained using SfM by tracking and matching a set of sparse feature points over the image sequence. This step also provides the camera positions, which are used in the subsequent steps of the pipeline. Such a 3D point cloud is very sparse, because very few feature tracks survive across images under drastic variations of lighting conditions. The coarse 3D mesh is defined by first projecting the point cloud onto one of the images and performing a 2D Delaunay triangulation, and then applying the same triangulation to the 3D point cloud. Such a 3D mesh allows us to decompose the images into a set of image patches, which simplifies the establishment of multi-view correspondences, and it further helps to assemble the fine-detailed surface patches into a dense surface. Once the multi-view images are decomposed, the corresponding image patches are registered to a template patch and PS is used to recover dense surfaces for all the patches. This requires adapting classic PS methods and reformulating them in a piecewise framework, which in turn requires an integration step to merge all surface patches into a single surface.

1.1 Comparisons with related work 

Recent advancements in rigid SfM have pushed the boundaries in different aspects, such as large-scale 3D modelling in urban areas [1], real-time dense 3D reconstruction [5] and, most recently, chronological scene reconstruction [3]. These methods either assume fixed lighting conditions, as in real-time scenarios, or use a tremendous number of images to find a much smaller number of points that can be tracked across the images. However, in realistic imaging conditions, the scene illumination is usually subject to change. Moreover, variation of the lighting conditions is fundamental to accurately reconstruct parts of the scene with homogeneous texture. Recently, Pizzoli et al. [3] and Engel et al. [5] showed that only high-gradient textures can be used as reliable observations of the scene for 3D reconstruction and that, to recover the 3D position of scene parts with homogeneous texture, strong prior information such as surface smoothness is essential.

Differently, we present a novel method that explicitly takes into account changing illumination and uses it to obtain dense 3D reconstructions in an uncalibrated multi-view setup. To tackle this problem, the classic formulation of PS is adapted to a piecewise framework operating on multi-view image patches that are registered to a template patch and contain a considerable amount of missing data. The dense matching of multi-view image patches is made possible by multi-view constraints imposed by a set of sparse image points that can be reliably tracked over the images.

To this end, our approach proposes a multi-view piecewise decomposition of the images that allows us to solve both the image pixel correspondence and the photometric stereo problems locally. In particular, we are able to deal with the most difficult case of wide-baseline images, where, given both lighting and geometric distortions, image patches exhibit strong scale changes, self-occlusion, cast shadows and strong specularities. Previous approaches, instead, rely strongly on small baselines or custom setups to establish reliable correspondences between pixels in different frames […]. Other geometric methods that do not exploit illumination variation rely on large datasets of images and require powerful processing resources […]. In this context, our framework is a step forward towards dense reconstruction from a few uncalibrated images with arbitrary camera motion and lighting conditions.

Factorization-based methods for PS have provided efficient closed-form solutions without using specific equipment such as structured light. They are based on the fact that a set of images taken from a static view and subject to varying lights lies in a low-dimensional subspace. Hayakawa […] first made such constraints evident, assuming a Lambertian surface and a single light source. Basri et al. [12] used a more descriptive photometric model based on a spherical harmonic representation of lighting variations. These classic PS methods, which make no assumption on depth, are always subject to the bas-relief ambiguity [13]. In contrast, Shi et al. [14] perform an automatic radiometric calibration by identifying a new set of constraints that can solve for the Generalised Bas-Relief (GBR) transformation. Finally, Papadhimitri and Favaro [15] proposed a closed-form solution based on the identification of a set of local diffuse reflectance maxima, which provides enough constraints to solve for the GBR ambiguity.
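The subspace argument can be made concrete with a small NumPy sketch (an illustration under simplifying assumptions, not the method of any cited work). It synthesizes a shadow-free Lambertian image stack whose rows are images, checks that the stack has rank 3, and factorizes it with a truncated SVD. The recovered factors reproduce the images exactly. However, they match the true lights and albedo-scaled normals only up to an invertible 3×3 matrix, which is the ambiguity discussed above. Sizes, seeds and function names are hypothetical.

```python
import numpy as np

def rank3_factorize(X):
    """Factor an f x b image matrix X ~ L_hat @ B_hat (L_hat: f x 3, B_hat: 3 x b) by truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    root = np.sqrt(s[:3])
    L_hat = U[:, :3] * root           # pseudo lighting directions (f x 3)
    B_hat = root[:, None] * Vt[:3]    # pseudo albedo-scaled normals (3 x b)
    return L_hat, B_hat

# Hypothetical synthetic data: f images of b pixels, no shadows (no clamping at zero)
rng = np.random.default_rng(0)
f, b = 10, 200
normals = rng.normal(size=(3, b))
normals[2] = np.abs(normals[2]) + 1.0            # keep normals facing the camera
normals /= np.linalg.norm(normals, axis=0)
albedo = rng.uniform(0.5, 1.0, size=b)
B_true = albedo * normals                        # 3 x b albedo-scaled normals
L_true = rng.normal(size=(f, 3))                 # f x 3 light vectors
X = L_true @ B_true                              # each row is one image

L_hat, B_hat = rank3_factorize(X)
print(np.linalg.matrix_rank(X))                  # 3
print(np.allclose(L_hat @ B_hat, X))             # True

# The factorization is unique only up to an invertible 3x3 matrix A: B_hat = A @ B_true
A = B_hat @ np.linalg.pinv(B_true)
print(np.allclose(A @ B_true, B_hat))            # True
```

Resolving `A`, or its GBR-restricted form when integrability is enforced, is exactly what the additional constraints of the methods cited above provide.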

Another viable option to obtain reliable and detailed 3D surfaces is to use a structured pattern for illuminating the object in a controlled environment […]. These active systems have led to a large number of custom solutions […] that require a laboratory setup and accurate calibration of the devices. Hernandez et al. [18], instead, use a less restrictive calibrated setup with three non-collinear coloured lights in a dark room, with surfaces that also need to be photometrically calibrated. Recently, Anderson et al. have extended this approach to surfaces of arbitrary colour […]. Finally, with a less constrained setup, the method of Park et al. [20] follows a coarse-to-fine approach that requires only a stereo pair.

More related to our approach, Multi-View Photometric Stereo (MVPS) methods use both geometric and photometric constraints to solve the problem in multi-view setups, but with short baselines [5]. Similarly, Lim et al. [7] use multiple views to generate a coarse planar approximation of the object as a 3D triangulated mesh based on recovered sparse 3D points. Then, they perform PS iteratively for each triangle to recover the dense surface and to align it with the recovered 3D points. They also assume fixed lighting conditions, and they can reconstruct objects only from small-baseline images. Joshi and Kriegman [8] define multi-view photometric stereo as the optimization of multi-view and photometric matching costs using a graph-cut based approach. The main drawback of this method is an inherent fronto-parallel assumption that leads to local rank constraints in the images.

In general, a piecewise formulation helps to deal with changes of both the geometric and the photometric components over time by simplifying the multi-view correspondence problem. Piecewise methods have recently demonstrated their strength for solving SfM problems of increasing complexity (see [21], […] and […]).


1.2 Contribution 

The problem addressed in this paper is one of the most challenging scenarios for dense 3D reconstruction. We do not use a specific imaging setup, and we consider images from uncalibrated cameras with arbitrary lighting conditions and viewpoints. Unlike other methods (e.g., […]) that require many images and huge amounts of processing resources to create fairly dense point clouds, we use a few images and the processing capacity of an ordinary PC to obtain dense surfaces.

Unlike […], we deal with a wide-baseline scenario where the images can be taken after considerable camera motion, which is the current weakness of MVPS algorithms. A piecewise formulation allows us to solve the MVPS problem using local estimates of the photometric properties of the shape, which is efficient because it solves several small optimization problems rather than a global one. Using piecewise photometric stereo (PPS), we can deal with local shape ambiguities through our local-to-global alignment of surface patches.

1.3 Structure of the Paper 

The rest of the paper is structured as follows. Section 2 describes PiMPeR, our pipeline for photo-geometric dense reconstruction. It details all the steps of the pipeline: the generation of a coarse mesh, the recovery of detailed surface patches and the specification of the optimization problem used to assemble the detailed surface patches on the coarse mesh. Experiments in Section 3 present the 3D reconstructions obtained by our approach and, finally, Section 4 draws conclusions and outlines directions for future research.

2 The PiMPeR Pipeline 

Our approach combines both multi-view and photometric constraints to obtain a complete 3D reconstruction of the object together with its photometric properties. The complete pipeline is sketched in Figure 1, and we briefly summarize each stage before describing them in more detail in the next sections. Note that the input to our algorithm is only a set of uncalibrated images; neither the object albedo nor the lighting directions are given. First, we create a coarse 3D mesh using SfM by tracking/matching a set of feature points in the image sequence.
Figure 1: The PiMPeR pipeline: the input sequence contains images with different viewpoints and arbitrary lighting; the coarse 3D mesh is generated by SfM and then projected onto the images; the input sequence is decomposed into triangular image patches; triangular image patches are registered to a template triangle, which is the corresponding facet of the coarse mesh; detailed surface patches are recovered using piecewise photometric stereo; the complete surface is obtained by globally registering all the surface patches on the coarse mesh.


Given the adverse lighting conditions, we account for large percentages of missing data in the tracked 2D features by adopting robust SfM methods. Each triangle of the mesh corresponds to a set of image patches. These image patches are first registered to a common image template, and then the corresponding surface patches are reconstructed using a local photometric stereo method. In the final stage, the geometric constraints on the surface (defined by the 3D mesh) are used to assemble the surface patches, taking advantage of the proposed photo-geometric alignment.

2.1 Mesh Generation 

The PPS reconstruction requires an initial partition of the images into a set of patches. For this reason, a 3D mesh is computed by performing multi-view 3D reconstruction from a set of 2D coordinates tracked/matched in the image sequence. This task can be done efficiently using a robust SfM method that can cope with a large amount of missing data in the measurements, followed by a 3D mesh generation stage (see [21]). In more detail, after recovering the 3D positions of $p$ image correspondences via rigid SfM and storing them in the matrix $S_{sfm}$, a Delaunay triangulation is computed to generate a coarse 3D mesh with triangular facets. The set $\mathcal{T}$ of $m$ triangles is given by:

$$\mathcal{T} = \{t_1, \dots, t_m\}, \qquad t_\mu \in \{1, \dots, p\}^3 \;\big|\; \mu = 1, \dots, m \qquad (1)$$

where $t_\mu$ contains the 3 indices identifying the 3 vertices of triangle $\mu$. The vertices of triangle $\mu$ in each image frame $g$ ($g = 1, \dots, f$) form a $2 \times 3$ matrix $W_{g\mu}$. Likewise, the 3D triangle vertices are stored in a $3 \times 3$ matrix $Z_\mu$. This defines two sets of matrices, which form the 3D mesh $S = \{Z_\mu\}$ and the 2D mesh at each frame $g$, $W_g = \{W_{g\mu}\}$.

In other words, the triangulation is performed on the 2D image points of an arbitrary view (e.g., the frontal view), and the resulting point indices that define the 2D triangulated mesh are then used to triangulate the 3D points.
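The index-reuse step can be sketched with NumPy fancy indexing. The sketch assumes that the m×3 triangle index array `T` (0-based) has already been produced by a 2D Delaunay routine on the reference view, for example `scipy.spatial.Delaunay(points_2d).simplices`. A hand-written `T` is used below so that the block only needs NumPy. The helper `mesh_matrices` and the toy data are hypothetical. For every triangle, the helper gathers the 3×3 matrix $Z_\mu$ and the 2×3 matrices $W_{g\mu}$ of all frames, with vertices stored as columns.

```python
import numpy as np

def mesh_matrices(T, S_sfm, W_frames):
    """Build Z_mu (3x3) and W_{g,mu} (2x3) for every triangle mu (hypothetical helper).

    T        : (m, 3) int array of vertex indices from a 2D triangulation of one view.
    S_sfm    : (3, p) array of 3D points recovered by SfM.
    W_frames : (f, 2, p) array with the 2D tracks of the same p points in f frames.
    """
    Z = S_sfm[:, T].transpose(1, 0, 2)            # (m, 3, 3): column j = 3D vertex j
    W = W_frames[:, :, T].transpose(0, 2, 1, 3)   # (f, m, 2, 3): column j = 2D vertex j
    return Z, W

# Toy data: 4 points forming two triangles (hypothetical values)
S_sfm = np.array([[0.0, 1.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0, 1.0],
                  [0.0, 0.1, 0.2, 0.3]])
T = np.array([[0, 1, 2], [1, 3, 2]])              # same indices reused for 2D and 3D
W_frames = np.stack([S_sfm[:2], S_sfm[:2] + 0.5]) # two frames: reference view and a shifted view
Z, W = mesh_matrices(T, S_sfm, W_frames)
print(Z.shape, W.shape)                           # (2, 3, 3) (2, 2, 2, 3)
```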

This mesh is then used for three purposes: decomposing the multi-view image frames into triangular image patches, assembling the recovered triangular surface patches into the global 3D shape, and correcting the local depth ambiguities.


2.2 Multi-View Image Patches Decomposition 


Once the mesh is generated, we proceed with the decomposition by partitioning the image sequence into pieces that will be reconstructed using a local photometric stereo algorithm.

Consider an image as a matrix $I_g$ of size $h \times w$. To perform the decomposition, the 3D mesh $S$ is projected onto the image frames $\{I_g\}$, $g \in [1, \dots, f]$, thus providing $m \times f$ triangles. A triangular patch is represented as a set of $h \times w$ binary matrices $B_{g\mu}$. Using such binary masks, the pixels included in triangle $\mu$ can be extracted from each image frame. To ease the subsequent 3D patch alignment step (Sec. 2.5), we enlarge the binary masks so that they share a set of overlapping pixels with their immediate neighbouring patches.

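Let $t_{g\mu}$ be the projection of triangle $t_\mu$ on image frame $g$, and consider the enlargement of triangle $t_{g\mu}$ such that the new 2D vertices on the image plane define a bigger triangle whose edges are parallel to the original ones. This enlarged triangle is denoted $\hat{t}_{g\mu}$, with 2D vertices $\hat{W}_{g\mu}$. The conversion from the 2D vertices $\hat{W}_{g\mu}$ to an image binary mask is made using the three barycentric coordinates $\alpha^{(\mu)}_{gk}$, $\beta^{(\mu)}_{gk}$ and $\gamma^{(\mu)}_{gk}$. Under this coordinate system, a pixel $k$ that belongs to the (enlarged) projection of triangle $t_\mu$ on frame $I_g$ must satisfy the following condition:

$$\alpha^{(\mu)}_{gk} + \beta^{(\mu)}_{gk} + \gamma^{(\mu)}_{gk} = 1, \qquad 0 \le \alpha^{(\mu)}_{gk},\ \beta^{(\mu)}_{gk},\ \gamma^{(\mu)}_{gk} \le 1 \qquad (2)$$

where

$$k \in \{1, \dots, b\}, \qquad \mathcal{K}_{g\mu} = \left\{k : \alpha^{(\mu)}_{gk} + \beta^{(\mu)}_{gk} + \gamma^{(\mu)}_{gk} = 1,\ \ \alpha^{(\mu)}_{gk}, \beta^{(\mu)}_{gk}, \gamma^{(\mu)}_{gk} \ge 0\right\} \qquad (3)$$

with $\mathcal{K}_{g\mu}$ being the set of indices of pixels belonging to triangle $\hat{t}_{g\mu}$ on frame $g$. Using this formalization, the binary mask for the projection of triangle $t_\mu$ on frame $g$ is defined as an $h \times w$ matrix:

$$[B_{g\mu}]_k = \begin{cases} 1 & \text{if } k \in \mathcal{K}_{g\mu} \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

So every image patch is given as:

$$J_{g\mu} = B_{g\mu} \odot I_g \qquad (5)$$

where the operator $\odot$ stands for element-wise multiplication.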
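Eqs. (2) to (5) map directly onto array operations. The NumPy sketch below computes barycentric coordinates for all pixels of an h×w frame, builds the binary mask with the inside test (all three coordinates non-negative), and extracts the patch by element-wise multiplication. The enlargement is assumed to be a uniform scaling about the centroid, which keeps the new edges parallel to the original ones. The scale factor 1.2, the function names and the toy frame are hypothetical, since the text does not specify the amount of enlargement. Pixels are vectorized column-wise, matching the vec(·) convention used later.

```python
import numpy as np

def enlarge_triangle(W, scale=1.2):
    """Scale a 2x3 triangle about its centroid (hypothetical scale); edges stay parallel."""
    c = W.mean(axis=1, keepdims=True)
    return c + scale * (W - c)

def barycentric_grid(W, h, w):
    """Barycentric coordinates (3, h*w) of all pixels w.r.t. triangle W (2x3, rows = x, y)."""
    ys, xs = np.mgrid[0:h, 0:w]
    P = np.vstack([xs.ravel(order="F"), ys.ravel(order="F")]).astype(float)  # column-wise order
    A = np.vstack([W, np.ones((1, 3))])           # [x; y; 1] = A @ [alpha; beta; gamma]
    return np.linalg.solve(A, np.vstack([P, np.ones((1, P.shape[1]))]))

def triangle_mask(W, h, w, eps=1e-9):
    """Binary h x w mask: 1 where all barycentric coordinates are non-negative (Eqs. 2-4)."""
    bary = barycentric_grid(W, h, w)
    inside = np.all(bary >= -eps, axis=0)         # coordinates already sum to 1
    return inside.reshape((h, w), order="F").astype(float)

rng = np.random.default_rng(1)
I_g = rng.uniform(size=(40, 60))                  # toy grayscale frame (h=40, w=60)
W_gmu = np.array([[10.0, 50.0, 20.0],             # x coordinates of the 3 vertices
                  [5.0, 10.0, 35.0]])             # y coordinates of the 3 vertices
B = triangle_mask(W_gmu, *I_g.shape)
B_big = triangle_mask(enlarge_triangle(W_gmu), *I_g.shape)
J_gmu = B_big * I_g                               # Eq. (5) with the enlarged mask
print(B.sum() < B_big.sum())                      # True: the enlarged mask covers more pixels
```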


2.3 Multi-View Image Pixel Registration 

Each triangle of the coarse 3D mesh corresponds to a set of image patches obtained by projecting that particular triangle onto the images. Given the camera (or object) motion, a set of image patches with different shapes and sizes is obtained for every triangular mesh facet. This introduces a challenge, because the pixel-to-pixel correspondence between image patches has to be solved. Moreover, since the patches may vary considerably in size, some patches might have missing pixels with respect to their corresponding reference patch.

For this reason, we need a registration process that aligns all the corresponding image patches in different views. To that end, for each triangle $t_\mu$ we define a template and register all image patches to their corresponding templates. Consider $\hat{Z}_\mu$, a $3 \times 3$ matrix containing the 3D coordinates of the enlarged triangle $\hat{t}_\mu$ from the coarse 3D mesh $S$. Each such 3D triangle is rotated to be fronto-parallel to the image plane [25], i.e., so that its normal is aligned to $N_f = [0\;0\;1]^\top$. The template triangle is obtained by estimating a rotation matrix $R_\mu$ that transforms the corresponding mesh facet in this way, and then projecting the rotated triangle onto an arbitrary image plane, such that:

$$Q_\mu = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} R_\mu \hat{Z}_\mu \qquad (6)$$

Hence, $Q_\mu$ is a $2 \times 3$ matrix containing the 2D coordinates of the 3 vertices of the template triangle:

$$Q_\mu = \left[\mathbf{q}_{\mu 1} \;\; \mathbf{q}_{\mu 2} \;\; \mathbf{q}_{\mu 3}\right] \qquad (7)$$
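One possible way to compute $R_\mu$ and the template $Q_\mu$ of Eqs. (6) and (7) is sketched below. It assumes that $R_\mu$ is the minimal rotation (Rodrigues' formula) taking the facet normal onto $N_f = [0\;0\;1]^\top$. The text does not state how $R_\mu$ is estimated, so this choice and the function names are assumptions. Once the facet is fronto-parallel, the 2×3 selection matrix of Eq. (6) simply drops the common depth, so the template keeps the facet's edge lengths.

```python
import numpy as np

def rotation_to_z(n):
    """Minimal rotation R with R @ (n / |n|) = [0, 0, 1] (Rodrigues' formula, assumed choice)."""
    n = n / np.linalg.norm(n)
    z = np.array([0.0, 0.0, 1.0])
    v = np.cross(n, z)
    c = float(n @ z)
    if np.isclose(c, 1.0):                        # already fronto-parallel
        return np.eye(3)
    if np.isclose(c, -1.0):                       # opposite normal: rotate by pi about x
        return np.diag([1.0, -1.0, -1.0])
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])            # cross-product matrix of v
    return np.eye(3) + K + K @ K / (1.0 + c)

def template_triangle(Z_hat):
    """Q = [[1,0,0],[0,1,0]] @ R @ Z_hat for a facet with vertices as columns (Eq. 6)."""
    n = np.cross(Z_hat[:, 1] - Z_hat[:, 0], Z_hat[:, 2] - Z_hat[:, 0])
    R = rotation_to_z(n)
    return np.eye(2, 3) @ R @ Z_hat, R

Z_hat = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])               # hypothetical facet: (0,0,0), (1,0,1), (0,1,1)
Q, R = template_triangle(Z_hat)
rotated = R @ Z_hat
print(np.allclose(rotated[2], rotated[2, 0]))     # True: all vertices share one depth
```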

A practical solution for registering multi-view triangular image patches to the corresponding templates is given by the barycentric coordinate system. Consider pixel $k$ from image frame $I_g$ belonging to triangle $\hat{t}_{g\mu}$. This pixel coordinate can be expressed as a 3-element barycentric coordinate:

$$\left[\alpha^{(\mu)}_{gk} \;\; \beta^{(\mu)}_{gk} \;\; \gamma^{(\mu)}_{gk}\right]^\top \qquad (8)$$


In this way, every pixel $k$ belonging to triangle $\hat{t}_{g\mu}$ in all image frames can be mapped to a single position (with respect to barycentric coordinates) in the template triangle given by $Q_\mu$, namely $Q_\mu \left[\alpha^{(\mu)}_{gk}\; \beta^{(\mu)}_{gk}\; \gamma^{(\mu)}_{gk}\right]^\top$. This piecewise planar mapping gives the pixel-to-pixel correspondences across all the views. Notice that, depending on the view, the number of pixels mapped into the template may be smaller than the overall size of the template triangle. This can introduce a considerable number of missing pixels in the registered triangles. Fig. 2 shows the missing pixel values for different registered image patches as green points. After registering all the triangles $t_{g\mu}$ from image frames $1, \dots, f$ to the corresponding template, we can form the multi-view intensity matrix as:
Figure 2: Multi-view patches in different frames are projected to the corresponding template, creating a stack of images where pixel correspondences are given. Green pixels show the missing values after registering multi-view image patches to the template triangle.


$$J_\mu = \begin{bmatrix} \mathrm{vec}(J_{1\mu})^\top \\ \vdots \\ \mathrm{vec}(J_{f\mu})^\top \end{bmatrix}_{f \times b} \qquad (9)$$

where the vector $\mathrm{vec}(J_{g\mu})$ ($g = 1, \dots, f$) is the column-wise vectorization of the registered image patch $J_{g\mu}$. Note that the registered image patches $J_{g\mu}$ are $h \times w$ matrices whose non-zero values are indexed by $\mathcal{K}_\mu$, the set of pixel indices that represent the registered triangle $t_{g\mu}$ (similar to Eq. (3)).
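The registration of Eq. (8) and the stacking of Eq. (9) can be sketched as follows. For each template pixel, the sketch computes barycentric coordinates with respect to $Q_\mu$ and samples each frame at the point with the same coordinates inside the projected triangle $\hat{W}_{g\mu}$. Sampling backwards from the template, with nearest-neighbour rounding, is an assumption made here for simplicity; it realizes the same barycentric correspondence described in the text. Template pixels whose source falls outside a frame are marked as missing (NaN), which is the situation shown in green in Fig. 2. All names and data are hypothetical.

```python
import numpy as np

def barycentric(W, P):
    """Barycentric coordinates (3, n) of 2D points P (2, n) w.r.t. triangle W (2x3)."""
    A = np.vstack([W, np.ones((1, 3))])
    return np.linalg.solve(A, np.vstack([P, np.ones((1, P.shape[1]))]))

def register_patches(frames, W_list, Q, tpl_shape):
    """Stack registered patches row-wise into J (f x b); NaN marks missing pixels (Eq. 9).

    frames : list of f grayscale images;  W_list : list of f projected triangles (2x3);
    Q      : template triangle (2x3) in template pixels;  tpl_shape : (h_t, w_t).
    """
    h_t, w_t = tpl_shape
    ys, xs = np.mgrid[0:h_t, 0:w_t]
    P = np.vstack([xs.ravel(order="F"), ys.ravel(order="F")]).astype(float)  # column-wise vec
    bary = barycentric(Q, P)
    in_tpl = np.all(bary >= -1e-9, axis=0)        # template pixels inside the triangle
    J = np.full((len(frames), P.shape[1]), np.nan)
    for g, (img, W) in enumerate(zip(frames, W_list)):
        src = W @ bary                            # same barycentric position in frame g
        cols = np.rint(src[0]).astype(int)        # nearest-neighbour sampling (assumption)
        rows = np.rint(src[1]).astype(int)
        ok = in_tpl & (rows >= 0) & (rows < img.shape[0]) & (cols >= 0) & (cols < img.shape[1])
        J[g, ok] = img[rows[ok], cols[ok]]
    return J, in_tpl

rng = np.random.default_rng(2)
frames = [rng.uniform(size=(30, 30)) for _ in range(2)]
Q = np.array([[0.0, 19.0, 0.0], [0.0, 0.0, 19.0]])        # 20x20 right-triangle template
W_list = [np.array([[5.0, 24.0, 5.0], [5.0, 5.0, 24.0]]),   # fully visible in frame 0
          np.array([[15.0, 34.0, 15.0], [5.0, 5.0, 24.0]])] # partly outside frame 1
J, in_tpl = register_patches(frames, W_list, Q, (20, 20))
missing = np.isnan(J[:, in_tpl]).mean(axis=1)
print(missing[0] == 0.0, missing[1] > 0.0)                  # True True
```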


2.4 Surface Patch Reconstruction 


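After registering all the corresponding multi-view image patches, it is possible to compute the photometric parameters (albedo, surface normals and lighting directions) from the matrix $J_\mu$ for each multi-view image patch. A grayscale pixel $x^{(\mu)}_{gi} \in J_{g\mu}$ (pixel $i$ from frame $g$ belonging to triangular patch $\mu$) can be defined in terms of the surface normal $\mathbf{n}_{\mu i}$, the albedo $\rho_{\mu i}$ at that pixel position and the lighting direction $\mathbf{l}_{\mu g}$ such that:

$$x^{(\mu)}_{gi} = \mathbf{l}_{\mu g}^\top \, \rho_{\mu i} \begin{bmatrix} 1 \\ \mathbf{n}_{\mu i} \end{bmatrix} \qquad (10)$$

where $\mathbf{l}_{\mu g} \in \mathbb{R}^4$, $\rho_{\mu i} \in \mathbb{R}$ and $\mathbf{n}_{\mu i} \in \mathbb{R}^3$, with the non-linear constraint $\mathbf{n}_{\mu i}^\top \mathbf{n}_{\mu i} = 1$.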


Rewriting Eq. (10) for all the pixels in patch $\mu$ and all the frames, according to [11], yields:

$$J_\mu = L_\mu N_\mu \qquad (11)$$

where the $f \times 4$ matrix $L_\mu$ contains the collection of lighting directions for the $f$ frames, while the $4 \times b$ matrix $N_\mu$ contains the surface normal vectors and albedo values.
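As a quick check of Eqs. (10) and (11), the sketch below builds $N_\mu$ from hypothetical albedos and unit normals, draws random first-order lighting vectors $\mathbf{l}_{\mu g} \in \mathbb{R}^4$, and confirms that the resulting $f \times b$ matrix has rank 4. Factorization-based approaches exploit this rank bound. Attached shadows and specularities, which the model ignores, break it on real images.

```python
import numpy as np

def photometric_matrix(albedo, normals):
    """N (4 x b): column i is rho_i * [1, n_i^T]^T, as in Eq. (10)."""
    return albedo[None, :] * np.vstack([np.ones((1, normals.shape[1])), normals])

rng = np.random.default_rng(3)
f, b = 12, 300                                  # hypothetical sizes
normals = rng.normal(size=(3, b))
normals /= np.linalg.norm(normals, axis=0)      # enforce n^T n = 1
albedo = rng.uniform(0.2, 1.0, size=b)
N_mu = photometric_matrix(albedo, normals)      # 4 x b
L_mu = rng.normal(size=(f, 4))                  # f x 4 lighting (hypothetical values)
J_mu = L_mu @ N_mu                              # Eq. (11)
print(J_mu.shape, np.linalg.matrix_rank(J_mu))  # (12, 300) 4
```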

However, the previous multi-view registration step has produced a matrix $J_\mu$ with missing values, and thus closed-form solutions such as [13] cannot be applied to reconstruct the surface patch. To deal with this kind of missing data, we first define a binary mask for each patch $\mu$ as a matrix $D_\mu$ containing zeros for the elements corresponding to missing values. As an additional source of missing data, we also consider extremely dark and saturated pixels, since they do not provide relevant information about the photometric properties of the shape. Thus, the estimation problem can be formalized as an optimization that solves for the photometric properties of a single patch $\mu$ such that:

$$\min_{L_\mu,\, N_\mu} \left\| D_\mu \odot \left(J_\mu - L_\mu N_\mu\right) \right\|^2 \quad \text{subject to} \quad N_\mu \in \mathcal{M} \qquad (12)$$

where

$$\mathcal{M} = \left\{ \rho \,[1 \;\; \mathbf{n}^\top]^\top \;\middle|\; \rho \in \mathbb{R},\ \mathbf{n} \in \mathbb{R}^3,\ \mathbf{n}^\top \mathbf{n} = 1 \right\}$$

is the photometric manifold (imposed on each column of $N_\mu$) representing the non-linear constraint given by the first-order spherical harmonic approximation, as in Eq. (10). This bilinear optimization problem with manifold constraints can be solved using a general-purpose solver, such as BALM [28]. Notice that we solve it for the surface normals associated with each pixel, but not for the overall 3D surface. Therefore, a further integration step is required to recover the final 3D surface from the surface normals […]. This step provides the 3-vector $\mathbf{s}^{(\mu)}_i$ related to pixel $x^{(\mu)}_i$, which represents the 3D surface point $i$ belonging to triangle $\mu$.
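The text solves Eq. (12) with BALM [28]. As a much simpler stand-in, the NumPy sketch below builds $D_\mu$ (missing, too dark or saturated entries set to zero) and minimizes the masked data term by alternating least squares, with one small problem per frame and per pixel. This is not BALM. The manifold constraint is not enforced inside the loop, there is no convergence guarantee, and the factors are recovered only up to the ambiguities discussed in Sec. 1.1, so success is measured by the fit on observed entries. The helper `manifold_project` shows the closed-form projection of a column onto $\mathcal{M}$ (for non-negative albedo) that a constrained solver could use. Thresholds, iteration count and names are hypothetical.

```python
import numpy as np

def observation_mask(J, dark=0.02, saturated=0.98):
    """D_mu: 1 where a pixel is observed and neither too dark nor saturated (hypothetical thresholds)."""
    Jz = np.nan_to_num(J)                          # NaN (missing) -> 0; excluded below anyway
    return (~np.isnan(J) & (Jz > dark) & (Jz < saturated)).astype(float)

def masked_als(J, D, rank=4, iters=150, seed=0):
    """Approximately minimize ||D * (J - L @ N)||_F^2 by alternating least squares.

    The manifold constraint of Eq. (12) is NOT enforced here.
    """
    f, b = J.shape
    Jz = np.where(D > 0, np.nan_to_num(J), 0.0)    # unobserved entries are never used
    N = np.random.default_rng(seed).normal(size=(rank, b))
    L = np.zeros((f, rank))
    for _ in range(iters):
        for g in range(f):                         # lighting: one least-squares problem per frame
            m = D[g] > 0
            L[g] = np.linalg.lstsq(N[:, m].T, Jz[g, m], rcond=None)[0]
        for i in range(b):                         # shape: one least-squares problem per pixel
            m = D[:, i] > 0
            N[:, i] = np.linalg.lstsq(L[m], Jz[m, i], rcond=None)[0]
    return L, N

def manifold_project(N):
    """Closest point, column by column, on {rho * [1, n^T]^T : |n| = 1}, assuming rho >= 0."""
    norm = np.linalg.norm(N[1:], axis=0) + 1e-12
    rho = 0.5 * (N[0] + norm)                      # optimal albedo once n = v / |v| is fixed
    return np.vstack([rho, rho * N[1:] / norm])

# Synthetic patch following Eq. (11), with missing and saturated entries (hypothetical values)
rng = np.random.default_rng(4)
f, b = 15, 200
n = rng.normal(size=(3, b))
n /= np.linalg.norm(n, axis=0)
rho = rng.uniform(0.3, 0.6, size=b)
N_true = rho * np.vstack([np.ones(b), n])
L_true = np.hstack([np.ones((f, 1)), rng.uniform(-0.3, 0.3, size=(f, 3))])
J = L_true @ N_true
J[rng.uniform(size=J.shape) < 0.15] = np.nan       # pixels lost during registration
J[0, :5] = 1.0                                     # a few saturated pixels
D = observation_mask(J)
L_hat, N_hat = masked_als(J, D)
Jobs = D * np.nan_to_num(J)
resid = np.linalg.norm(Jobs - D * (L_hat @ N_hat)) / np.linalg.norm(Jobs)
print(resid < 1e-3)                                # fit on the observed entries
```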

Considering all triangular patches, $m$ small PS problems have to be solved. This piecewise formulation of photometric stereo entails several advantages with respect to previous global approaches. Each patch is associated with an individual photometric model rather than a single global model, as used in classical approaches. In this way, the piecewise formulation may capture more complex lighting effects. Moreover, it is intrinsically more efficient and highly parallelizable, since reconstructing the surface in patches is computationally faster than reconstructing the global surface.

The next section presents the final step of the pipeline and shows how to re-assemble the local patches in a global reference system given the coarse 3D mesh.


2.5 Surface Patch Alignment 

A global and dense surface can be obtained only by registering each reconstructed surface patch to the corresponding position on the coarse 3D mesh $S$. Such a mapping is possible through the inverse of the transformations in Eq. (6), which aligns all surface patches back to the 3D mesh. In this way, we can map image pixels belonging to the registered triangle patch $J_\mu$ to the corresponding facet of the 3D mesh $S$ as below:

(13)

where the vectors $\mathbf{x}_\mu$ and $\mathbf{y}_\mu$ contain the x and y components of the pixels belonging to the registered image patch $J_\mu$, and $P_\mu$ is a matrix containing the points of the corresponding facet of the 3D mesh.
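Since Eq. (13) undoes the rotation of Eq. (6), one consistent way to implement it is to lift the template coordinates $(\mathbf{x}_\mu, \mathbf{y}_\mu)$ to the common depth of the rotated facet and rotate them back with $R_\mu^\top$. The sketch below assumes exactly that form, which is a reading consistent with Eq. (6) rather than a formula given in the text. The function name and example data are hypothetical. The template vertices land on the facet vertices, and every lifted point lies on the facet plane.

```python
import numpy as np

def lift_to_facet(x, y, R, Z_hat):
    """Map template coordinates (x, y) back onto the 3D facet Z_hat (3x3, vertices as columns).

    Assumes R @ Z_hat is fronto-parallel, i.e. all rotated vertices share one depth d.
    """
    d = (R @ Z_hat)[2].mean()                      # common depth of the rotated facet
    pts = np.vstack([x, y, np.full_like(x, d)])    # points in the rotated (template) frame
    return R.T @ pts                               # P_mu: 3 x n points on the mesh facet

# Hypothetical example: rotation about the x axis, triangle at depth 2 in the rotated frame
a = 0.4
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(a), -np.sin(a)],
              [0.0, np.sin(a), np.cos(a)]])
Q = np.array([[0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]) # template vertices
Z_hat = R.T @ np.vstack([Q, np.full(3, 2.0)])      # the same facet in mesh coordinates
x = np.array([0.0, 10.0, 0.0, 2.5])
y = np.array([0.0, 0.0, 10.0, 2.5])
P_mu = lift_to_facet(x, y, R, Z_hat)
print(np.allclose(P_mu[:, :3], Z_hat))             # True: template vertices map to facet vertices
```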

Note that the transformed triangular plane is larger than the corresponding facet, since we map the projection of enlarged triangles to the 3D mesh. These additional pixels help us to find the photo-geometric transformation, since they are overlapping points along the borders of neighbouring triangular patches. These overlapping points should be aligned with the corresponding ones on the neighbouring surface patches, and this constraint is used to obtain the final surface. We can represent the surface patch on the 3D mesh as the transformed image points $P_\mu$ from triangular image patch $J_\mu$, elevated along the facet normal $N_\mu$ according to the recovered surface patch $S_\mu$, where

$$S_\mu = \left[\mathbf{s}_{\mu 1} \;\cdots\; \mathbf{s}_{\mu |\mathcal{K}_\mu|}\right]_{3 \times |\mathcal{K}_\mu|} \qquad (14)$$

accumulates $|\mathcal{K}_\mu|$ 3D vectors representing points on the surface of triangular patch $t_\mu$.

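Therefore, the relation between the overlapping points of surface patches belonging to neighbouring triangles $t_\mu$ and $t_{\mu'}$ can be described by considering that corresponding points from the two surface patches should be identical on the global surface $S$. This relation is represented as:

$$P^{(c)}_\mu + [\mathbf{0}_3 \;\; \mathbf{0}_3 \;\; N_\mu]\, H_\mu\, S^{(c)}_\mu = P^{(c)}_{\mu'} + [\mathbf{0}_3 \;\; \mathbf{0}_3 \;\; N_{\mu'}]\, H_{\mu'}\, S^{(c)}_{\mu'} \qquad (15)$$

where $P^{(c)}_\mu$ and $P^{(c)}_{\mu'}$ represent the common points between $P_\mu$ and $P_{\mu'}$, $S^{(c)}_\mu$ and $S^{(c)}_{\mu'}$ give the corresponding points between $S_\mu$ and $S_{\mu'}$, $H_\mu$ and $H_{\mu'}$ are $3 \times 3$ transformation matrices, and $\mathbf{0}_3$ is a vector of 3 zeros. We can rewrite Eq. (15) as:

$$\mathrm{vec}(P^{(c)}_\mu) + \left((S^{(c)}_\mu)^\top \otimes [\mathbf{0}_3 \;\; \mathbf{0}_3 \;\; N_\mu]\right) \mathrm{vec}(H_\mu) = \mathrm{vec}(P^{(c)}_{\mu'}) + \left((S^{(c)}_{\mu'})^\top \otimes [\mathbf{0}_3 \;\; \mathbf{0}_3 \;\; N_{\mu'}]\right) \mathrm{vec}(H_{\mu'}) \qquad (16)$$

where the operator $\otimes$ is the Kronecker product.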
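Eq. (16) follows from Eq. (15) through the identity $\mathrm{vec}(AHS) = (S^\top \otimes A)\,\mathrm{vec}(H)$, with vec the column-wise vectorization used throughout. The short NumPy check below verifies it for $A = [\mathbf{0}_3\;\mathbf{0}_3\;N_\mu]$ and random $H_\mu$ and $S^{(c)}_\mu$. NumPy flattens row-major by default, so column-wise vec needs `order="F"`. The check also shows that the term $A H S$ only displaces points along $N_\mu$. The number of points and all values are hypothetical.

```python
import numpy as np

def vec(M):
    """Column-wise vectorization, matching vec(.) in Eqs. (9) and (16)."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(5)
c = 7                                                    # overlapping points (hypothetical)
N_mu = rng.normal(size=3)
N_mu /= np.linalg.norm(N_mu)
A = np.column_stack([np.zeros(3), np.zeros(3), N_mu])    # [0_3 0_3 N_mu]
H = rng.normal(size=(3, 3))
S_c = rng.normal(size=(3, c))
lhs = vec(A @ H @ S_c)
rhs = np.kron(S_c.T, A) @ vec(H)
print(np.allclose(lhs, rhs))                             # True
disp = A @ H @ S_c                                       # displacement of each point
print(np.allclose(np.cross(disp.T, N_mu), 0.0))          # True: parallel to N_mu
```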


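Writing Eq. (16) for all the points overlapping between every pair of triangles, the problem of estimating the matrices $H_\mu$ and $H_{\mu'}$ can be formalized as an optimization problem:

$$\min_{H_\mu,\, H_{\mu'}} \sum_{(\mu, \mu')} \left\| \mathcal{F}_\mu(H_\mu) - \mathcal{F}_{\mu'}(H_{\mu'}) \right\|^2 + \kappa \qquad (17)$$

where $\mathcal{F}_\mu(H_\mu) = \mathrm{vec}(P^{(c)}_\mu) + \left((S^{(c)}_\mu)^\top \otimes [\mathbf{0}_3 \;\; \mathbf{0}_3 \;\; N_\mu]\right) \mathrm{vec}(H_\mu)$ denotes the left-hand side of Eq. (16) ($\mathcal{F}_{\mu'}$ is the right-hand side), and the curvature term $\kappa$ is defined from $C_\mu$ and $C_{\mu'}$, the surface curvatures of triangles $t_\mu$ and $t_{\mu'}$. The curvature term $\kappa$ prevents the solutions for $H_\mu$ and $H_{\mu'}$ from being chosen in a way that flattens the surface patches $S_\mu$. Solving Eq. (17) for the transformations $H_\mu$ and $H_{\mu'}$ with least squares aligns all the surface patches, resolving their local bas-relief ambiguity up to a reduced local ambiguity.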
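A toy version of the least-squares alignment for a single pair of neighbouring patches is sketched below. The data term alone can be driven to zero by flattening the patches (for instance, H = 0 when the facet points coincide), which the curvature term κ is meant to prevent. The sketch therefore replaces κ with a simpler gauge choice: the reference patch keeps $H_\mu = I$, and only $H_{\mu'}$ is estimated from Eq. (16) by linear least squares. This substitution, the synthetic data and all names are assumptions for illustration. Because $[\mathbf{0}_3\;\mathbf{0}_3\;N_{\mu'}]$ annihilates the first two rows of $H_{\mu'}$, only its third row is observable, and the minimum-norm solution returned by `np.linalg.lstsq` sets the other rows to zero.

```python
import numpy as np

def vec(M):
    return M.reshape(-1, order="F")               # column-wise vectorization

def normal_lift(N):
    """[0_3 0_3 N]: turns the third row of a 3x3 transform into displacements along N."""
    return np.column_stack([np.zeros(3), np.zeros(3), N])

def align_to_reference(P_ref, N_ref, S_ref, P_new, N_new, S_new):
    """Estimate H_new with H_ref fixed to I (gauge choice standing in for the curvature term)."""
    target = vec(P_ref + normal_lift(N_ref) @ S_ref) - vec(P_new)
    M = np.kron(S_new.T, normal_lift(N_new))      # Eq. (16) design matrix for vec(H_new)
    h, *_ = np.linalg.lstsq(M, target, rcond=None)
    return h.reshape(3, 3, order="F")

# Synthetic pair of patches sharing c overlapping points (hypothetical values)
rng = np.random.default_rng(6)
c = 8
X = rng.normal(size=(3, c))                       # true positions of the overlapping points
N_ref, N_new = np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.6, 0.8])
S_ref, S_new = rng.normal(size=(3, c)), rng.normal(size=(3, c))
H_true = rng.normal(size=(3, 3))
P_ref = X - normal_lift(N_ref) @ S_ref            # so that P_ref + [0 0 N_ref] I S_ref = X
P_new = X - normal_lift(N_new) @ H_true @ S_new
H_hat = align_to_reference(P_ref, N_ref, S_ref, P_new, N_new, S_new)
print(np.allclose(H_hat[2], H_true[2]))           # True: the observable third row is recovered
```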

The global dense surface S is obtained after variational refinement of the 
surface through minimizing the energy function: 


E(S) = 


A(l-G(s)) 


2 

+ G(s) 




2 



(18) 


where s is a surface point, (J is the geometric superposition of surface patches, 
and A > 0 is a parameter to control the smoothness. G(.) is the weighting 
function 

G(s) = ^\\a* - p% + ||a s - 7 S || 2 + || 7 S - p\\ 2 , (19) 

where a s , /3 s and 7 s are the barycentric coordinates of 3D point s with respect 
to the corresponding triangle. 
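The weighting function of Eq. (19) depends only on the barycentric coordinates of a point. The sketch below evaluates it, assuming a leading factor of 1/2; this factor is an assumption that keeps $G(\mathbf{s})$ in $[0, 1]$, so that the factor $1 - G(\mathbf{s})$ appearing in Eq. (18) stays non-negative. With this choice, G is 0 at the triangle centroid, 1/4 at edge midpoints and 1 at the vertices. The function name and sample points are hypothetical.

```python
import numpy as np

def weight_G(bary):
    """G(s) of Eq. (19) for barycentric coordinates bary = (alpha, beta, gamma), shape (3, n).

    Assumes a leading factor of 1/2, so that G lies in [0, 1] on the triangle.
    """
    a, b, g = bary
    return 0.5 * ((a - b) ** 2 + (a - g) ** 2 + (g - b) ** 2)

pts = np.array([[1/3, 1.0, 0.5],
                [1/3, 0.0, 0.5],
                [1/3, 0.0, 0.0]])    # columns: centroid, a vertex, an edge midpoint
print(weight_G(pts))                  # approximately [0, 1, 0.25]
```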


3 Experiments 

The proposed 3D reconstruction pipeline was tested on several sequences from the RobotDataset¹. The main feature of this dataset is the availability of a dense 3D ground truth obtained by structured light. This represents a step forward with respect to previous evaluation protocols, where experimental results were mostly assessed on a qualitative basis. In terms of PS reconstruction, the sequences offer several challenges: i) the setup uses LED lights for illumination and is completely uncalibrated; ii) strong shadows affect the images; iii) the object surfaces generally depart from a Lambertian model; iv) the corresponding PS problem is also large-scale in nature. Each single image has a frame size of 1600 × 1200 pixels; therefore, each image sequence has a number of elements (pixels) on the order of $10^9$.

¹ http://roboimagedata.imm.dtu.dk
Comparison with previous approaches. Given a test sequence in the dataset, we select 30 frames with both a moving camera and varying light sources. The camera motion on the arc gives an overall variation of 50 degrees. In addition to the wide baseline, the images are high-resolution and present strong shadows and reflection effects. Figure 3 shows the results for three sequences in the dataset (Mug, Jar and Ball) compared with a classical MVS method (i.e., Furukawa and Ponce [23]) and the MVPS approach of Lim et al. [7].

The Mug sequence has a glossy surface with high reflectance, but it is highly textured. Such properties make it challenging for standard PS, but the texture is an advantage for MVS methods. In particular, this sequence shows that our approach can provide results comparable with MVS methods. The Ball sequence has strong cast shadows and complex bending, which introduce interesting challenges for MVS methods. Although the surface is not reflective and can be considered almost Lambertian, the extreme lighting conditions also pose a challenge to standard PS approaches. The Jar sequence also has a glossy surface with many saturated pixels, which makes the photometric reconstruction very challenging. Moreover, its smooth convexity and lack of sufficient texture introduce difficulties for MVS methods.

Multi-view photometric stereo approaches have the disadvantage of requiring short-baseline images. As a practical example, we tested the method by Lim et al. [7], which has similarities with our approach. We noticed that the main problem affecting this approach is the difficulty of finding correct pixel-to-pixel correspondences when the baseline between views increases. In most of the tested cases, the shape recovered by Lim et al. [7] is almost flat, with some bumps and a few spikes (Figure 3, third column).

In contrast, multi-view stereo approaches can deal with wide-baseline images, but the reconstruction is not dense enough to include details of the surface. The rightmost (fourth) column of Figure 3 shows the reconstruction from Furukawa and Ponce [23]. Although the overall geometry of the recovered shape is acceptable, the surface can only be recovered for highly textured parts of the shape, as in the Mug sequence. The strong shadows in the Ball sequence prevented this method from matching pixels on some parts of the surface, which is also the case in the Jar sequence because of its poor texture.

Computational gain. Our piecewise formulation provides an efficient solution in terms of computational cost. This comes directly from considering several smaller PS problems instead of a single large-scale problem. In general, a piecewise formulation deals with multi-view patches of about $10^4$ pixels, while a single PS example might amount to approximately $10^8$ pixels. To show this effect, we compared the processing time and the performance for the potato sequence example in [23], solved with static-view PS, with an analogous sequence processed by our MVPS. The reported running time in [23] is almost 22 hours, with a mean 3D error of 17.34%, whereas our method achieved an 11.15% 3D error in 50.8 minutes.
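The figures quoted above can be checked with a few lines of arithmetic. The sketch reproduces the speed-up implied by the potato comparison (22 hours versus 50.8 minutes). Assuming that the "# Pixels" column of Table 1 counts all registered pixels of a sequence, it also computes the average number of pixels per triangular multi-view patch, which ranges from roughly $3 \times 10^4$ to $1.3 \times 10^5$. Only numbers reported in the text and in Table 1 are used; the variable names are hypothetical.

```python
# Values copied from the text and from Table 1
static_ps_minutes = 22 * 60          # "almost 22 hours" reported in [23]
pimper_minutes = 50.8                # our running time on the potato sequence
speedup = static_ps_minutes / pimper_minutes
print(f"speed-up ~ {speedup:.1f}x")  # speed-up ~ 26.0x

table1 = {  # sequence: (# triangles, # pixels)
    "Mug": (198, 6_689_940),
    "Jar": (206, 22_618_080),
    "Balls": (362, 42_374_490),
    "Potatoes": (276, 36_672_990),
}
for name, (triangles, pixels) in table1.items():
    # Average size of one multi-view patch, assuming "# Pixels" covers the whole sequence
    print(f"{name}: {pixels / triangles:,.0f} pixels per triangle")
```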

Table 1 provides some quantitative data regarding our solution. The rate of missing pixel values is considerable, and the obtained mean 3D error shows that PPS can handle remarkable amounts of missing data. The timing presented in this table includes the reconstruction of all the patches and the photo-geometric alignment.

Figure 3: Images from the Mug, Jar and Ball sequences together with our results (error heatmaps and 3D surfaces in the 1st and 2nd columns) and comparisons with an MVPS method [7] (3rd column) and an MVS method [29] (4th column).
Table 1: Quantitative data regarding our solution for different sequences.

Sequence   # Mesh points   # Triangles   # Pixels     % Missing pixels   Time (min)   3D error
Mug        115             198           6,689,940    24.7%              24.38        7.1%
Jar        130             206           22,618,080   57.79%             33.14        9.34%
Balls      250             362           42,374,490   69.48%             62.63        9.69%
Potatoes   150             276           36,672,990   62.72%             50.8         11.15%


4 Conclusions 

We have presented a novel photo-geometric method for dense reconstruction from multi-view images with arbitrary lighting conditions. The approach is able to cope with wide-baseline images from uncalibrated cameras and explicitly exploits varying lighting directions in the image sequence. This means that only a few images taken by the end user are enough to recover dense 3D surfaces. The piecewise approach is highly scalable, since solving for image patches is computationally easier than considering whole images. This enables the pipeline to run on commodity PCs. Future work will be dedicated to studying approaches that partition the image into a mesh while taking into account the photometric and geometric properties of the shape. For instance, the method in [30] could be used to partition the image into more consistent patches allocated to different subspaces. In addition, more complex photometric models could be used to extract more realistic photometric attributes of the surface.